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<^ \ Abstract 

Evaluation of HIV large scale interventions programme is becoming increasingly important, but 
impact estimates frequently hinge on knowledge of changes in behaviour such as the frequency of 
condom use (CU) over time, or other self -reported behaviour changes, for which we generally have 
limited or potentially biased data. We employ a Bayesian inference methodology that incorporates 
a dynamic HIV transmission dynamics model to estimate CU time trends from HIV prevalence 
■ data. Estimation is implemented via particle Markov Chain Monte Carlo methods, applied for the 

£ — , . first time in this context. The preliminary choice of the formulation for the time varying parameter 

reflecting the proportion of CU is critical in the context studied, due to the very limited amount 
of CU and HIV data available We consider various novel formulations to explore the trajectory of 
CU in time, based on diffusion-driven trajectories and smooth sigmoid curves. Extensive series of 
numerical simulations indicate that informative results can be obtained regarding the amplitude of 
the increase in CU during an intervention, with good levels of sensitivity and specificity performance 
in effectively detecting changes. The application of this method to a real life problem illustrates how 
it can help evaluate HIV intervention from few observational studies and suggests that these methods 
can potentially be applied in many different contexts. 



1 Introduction 

Significant resources are being committed to implement large-scale interventions against infectious dis - 



eases, such as HIV/ AIDS that killed an estimated two million individuals in 2008 dUNAIDS et aU l2009). 
Although such interventions are implemented on a large scale because they are expected to work, in- 
creasing attention is given to the evaluation of these large-scale intervention programmes to understand 
what still needs to be done to control the epidemic and eventually achieve elimination, ensuring that 
resources are not waisted on strategies that do not work. 

Even if antiretroviral therapy has become an important component of large scale prevention interven- 
tions, condom use and circumcision remain important strategies for reducing HIV transmission. While 
there are difficulties in estimating condom use trends accurately, due to biases inherent in self-reported 
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behaviour (ITurner and Miller!. 1 1997; Zenilma n et al.l.ll995l : lHanck et all 12008b . its average level closely 
determines the spread of HIV dBoily et al. . 2007 ). Thus, it is important to assess if trends in epidemio- 
logical data such as HIV prevalence can be used to infer the impact of interventions on risk behaviours 
that are susceptible to self-reported bias. This is motivated by the fact that directly observed quantities 
as HIV prevalence do not provide straightforward indications on the impact of an intervention. Indeed, 
an epidemic has an intrinsic dynamic, which can cause the prevalence to grow although an efficient in- 
tervention is being led if the intervention is introduced early in an epidemic. Alternatively, in a mature 
epide mic the prevalence can decrease even though on-going interventions are inefficient (IBoily et all 
l2002h . However, the trajectory of CU over time, and especially since the beginning of a prevention pro- 
gramme, can shed light on the impact of the intervention and on the future trajectory of the epidemic. In 
this light, we apply a Bayesian methodology to trends in HIV prevalence data, focusing on the specific 
example of Avahan, India AIDS initiative, a large-scale HIV/ AIDS intervention targeted to high risk 
groups. 

The Avahan intervention was motivated by high levels of HIV prevalence amongst high-risk groups 
observed in southern India (typically > 20%) (Ramesh, et al., 2008), which lead to concerns about infec- 
tions bridging to their long-term partners and the general population. The programme was launched by 
the Bill & Melinda Gates Foundation in 2003 (BMGF, 2008), and has targeted high-risk groups for HIV 
infection, in particular female sex workers (FSWs), by promoting and distributing free condoms. Dif- 



feren t studies have been conducted to examine the impact of Avah an (IBoily et all 120071 : Peering et al 



120081 : IBoily et all 120081 : lLowndes et"ail 120101 : IPickles et all l2010h . and to learn from it in order to in- 
form future large-scale interventions. A key part of such evaluations is examining how risk behaviours, 
chiefly condom use (CU), defined as the proportion of sex acts protected by condoms at a given time, 
have changed over the course of the intervention. However, this can be difficult to measure in prac- 
tice. Baseline CU may be difficult to record when an intervention needs to be implemented rapidly, as 
happened with Avahan, or may be recorded o nly on few occa s ions. While those targeted by the inter- 
vention may be asked about their CU history ( Lowndes et al. . 2010h . their answers may be subject to 
social desirabili ty and recall biases. In principle, the total number of condoms sold or distributed can 
be enumerated (IBradley et all I2010T) . but accurate records may not be available, condoms may be used 
for family pla nning by lower-risk individuals, and the distribution of condoms is not a guarantee of their 
correct usage (Br adley et all l2010l : Kumar et all 1201 lh . Thus, in addition to direct approaches through 
quantitative behavioural surveys or records of condom availability, model-based methods can be used to 
infer unobserved quantities of interest, such as CU, and complete the partial information available from 
observed quantities such as HIV prevalence, using know ledge of the dynami cs of large-scale epidemics. 
A first study in the context of Avahan was presented in lPickles et all (1201 01) . In this work, a determin- 
istic dynamic model for HIV/sexually transmitted infection was formulated based on a compartmental 
representation incorporating heterogeneous sexual behaviour. The model included various parameters 
for which informative prior dist ributions were used . Prior elicitation was based on various data sources, 
such as previous literature (see Pic kles et al. d2010h for more details) and serial cross-sectional surveys 
termed integrated behavioural and biological assessment (IBBA) conducted in the districts of India tar- 
geted by the intervention. The objective was to utilise this model and assess its ability to fit the available 

prevalence observations under three different hypothesised scenarios of evolution of C U. 

The work we present in this paper operates in the same context as in lPickles et al. d2010h . but focus 
is given on exploring the entire space of CU trajectories, rather than considering three scenarios regard- 
ing its evolution. Similarly, the model formulation can also be put in a state space setting where an 
underlying latent process (CU trajectory) is observed through the prevalence data, and the link between 
these quantities is given by the deterministic model for the HIV infections. Inference in this context is 
a challenging task given the limited amount of HIV prevalence data aside from initial conditions (three 
or four observations in total) that are concentrated over a period of 6 years, and are utilised to estimate a 
25-years long trajectory. Various models for the CU trajectories were considered, including smooth and 
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non-differentiable (yet continuous) choices. In the remainder of this paper, the term 'trajectory prior' is 
used to refer to these models in order to avoid confusion with the deterministic HIV model. Note how- 
ever that the trajectory priors include parameters for which there exist some information in the data. We 
also present a general and efficient computational sc heme using Markov Chain Mo nte Carlo (MC MC) 
techniques based on the particle MCMC algorithm dAndrieu et all 2010h : see bureau et all d2012h for 
an application in a similar context. Focus is given on estimating the amplitude of the change in CU 
since 2003 (the start of Avahan) in order to assess the impact of the Avahan intervention on CU. The 
properties of the estimators arising from the MCMC are studied via simulations, and the performance is 
assessed from a decision-making perspective through their sensitivity and specificity in detecting strong 
changes in CU. 

The next section presents the models introduced in this paper, the data that are typically available 
for such studies, and the way in which prior information is incorporated. The computational techniques, 
mainly the particle MCMC algorithm, are also presented. The developed methodology to compare the 
performance of the proposed trajectory priors is presented in Section [3j and the results from this study 
are introduced in Section|4]along with an application to real data from the Indian AIDS initiative Avahan. 
Finally Section [5] concludes with some relevant discussion. 



2 Models and Methods 

2.1 HIV transmission model for female sex workers 

We use a deterministic model of HIV transmission in a stable but open population of sex workers and 
their clients. The model structure accounts for high-risk and low-risk FSWs, who have different numbers 
of clients. This model is parameterised using data from serial cross-sectional bio-behavioural surveys 
(IBBAs) in Mysore district in southern India (Ramesh, et al., 2008). Some uncertainty remains about 
these biol ogical and behavioural p arameters, which is refl ected on the estima tes of CU using a Bayesian 
approach ( De Angelis et al. , 1998b . As motivated in lvickerman et al. ( 2010h . low -risk individuals unin- 



volved directly in sex work are ignored as they have little influence on the dynamics of the epidemic. 
Each individual in these three groups is either susceptible to HIV infection, infectious, or retired either 
due to death or ceasing commercial sexual activity. The flow-diagram corresponding to high-risk FSWs 
is shown in Figure 1 . In addition, individuals that either decease or stop being involved in commercial 
sex are replaced by susceptible ones, maintaining the population at risk at a constant size. As illustrated 
in the figure, the force of infection j8 is a function of a number of different parameters: 

• NbClients RR and NbClients LR : number of clients of FSWs per month, which differs for high risk 
and low risk FSW 

• NbEncounters: mean number of encounters with a FSW per client per month 

• NbActs: number of acts per client encounter 

• Pm^f or p'f^m '■ probability of HIV transmission from male to female or female to male respec- 
tively during an unprotected sex act 

• Cond e ff : efficacy of condoms in protecting against transmission of HIV per sex act 

• CU (?): proportion of FSWs' sex acts that are protected by condoms, that we allow to vary in time 
(parameter that we want to estimate) 

Additionally, the transmission dynamics of HIV will depend on the following lengths of time: 

• Hp 1 or jXp 1 : average length of sexual activity as a sex worker / client 
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Figure 1: Flow-diagram of the model for high-risk FSWs. Transmission dynamics for low-risk FSWs 
and clients are defined similarly 

• a~ l : average life expectancy with HIV 

In mathematical terms, the model can be defined with a set of differential equations: 
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with: 



P p i = [l - (l-p t ^ F ) mActs ]NbClients liR {l-Cond eff CU t ) 
j8 F2 = [1 - (1 - p%^ F ) NhActs ]NbClients LR {\ - Cond eff CU t ) 

RM _ ri I \ „tr \NbActsl I NbClients HR +NbClients LR \ Tot F r„„A rn \ 
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Propp 2 — N j, c i ients HR Tot ^ + NbOients LR Tot Fl 

In the model above, the state of the epidemic over time is described by the number of suscep- 
tibles among clients (Sm), among the low-risk female sex workers (Sf { ) among the high-risk female 
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sex workers (SpJ, and the corresponding numbers of infected individuals (HIVm, HIVp l and HIVp 2 ). 
Three types of constant parameters are involved: the initial prevalence among the different groups 
of interest in 1985 (d Lc . = {S Fl (1985), HIV F] (1985), Sp 2 (1985),#/V F , (1985), S W (1985),#/VV(1985)), 
time-invariant parameters describing the biological and behavioural determinants of HIV transmission 
(d tr , = {lAF,l*M,w,NbClients HR ,NbClients LR ,PM^ F ,pp^ M ,NbActs,Cond e ff}) and the parameters that 
play a role in the trajectory priors for CU between 1985 and 2010 (dcu)- All three components 9 ; -. c . , B tK , 
and dcu are integrated into a global vector of constant parameters, denoted by 6. Under this notation, the 
trajectory X\- n of the space vector X t = {Sp, (t),HIVp l (t),SF2(t), HIVp 2 (t),SM{t),HIVM{t)} is defined as 
a deterministic function of 8 and CU (X\- n = f[6,CU]) through an HIV transmission model, and is 
compared with the available observations, denoted by y\- n . Note that the function /(.) is not available 
in closed form but can be obtained given the trajectory of CU by solving the above ordinary differential 
equations (ODE). More specifically, we introduce a time discretisation with equidistant points of time 
step 8 resulting in a discretised skeleton of CU denoted CU discr = {CU to ,CU to+s ,CU to+2 s,-,CU tn }. The 
partition of the CU trajectory can be made arbitrarily fine by the user-specified parameter 8 to limit the 
approximation error induced by the time discretisation. 

Assigning a model for the observation error provides the likelihood of the observation prev° b * con- 
ditional on the CU trajectory p (prev° b ° \ 6 , CU ) . In this paper we use a binomial distribution, considering 
that prevalence estimates are derived from a random sample of 425 FSWs or clients in the Mysore dis- 
trict. More specifically if we denote prev model the value of the prevalence estimated through the HIV 
transmission model (prevf" del = (HlVp^tj) + HIVp 2 (ti))/2 if prevalence among FSWs is observed at 
time prevf odd = ///Vm(0 if prevalence among clients is observed at time ?,•), the distribution will be 
the following: 

425 x prevf s ~ Bin(425,prev'r de ') 

Overall, the model appearing of Figure [T] and Equations [T] and [2] is a simplified version of the one in 
in (Pickles, et al., 2010). This was done mainly for parsimony reasons; models of increased complexity 
can be used provided that there is adequate information on their parameters. More details on informative 
priors are provided in Section 1231 



2.2 Trajectory priors for condom use 



In this paper we introduce three different formulations for the evolution of the CU trajectory. Our first 
trajectory prior assigns a Brownian motion to CU, transforme d to take values in the real line, aiming to 
impose little restrictions to its shape. Initial considerations in IPickles et all d201fjh and classic literature 
on smoothly growing quantities also motivated the introduction of alternative formulations based on 
sigmoid-shaped growth curves. Hence, the second traje ctory pr i or, denoted by dBR, i s bas ed on the gen- 
eralised Bertalanffy-Richards model; see for example (IGarciaL 1 19831 : lYuancai et all I1997T). In order to 
enrich this context and address estimation issues that can be encountered with the dBR dLei and Zhangl 
2004), we also consider an alternative empirical sigmoid curve (dSigm). In what follows, we denote 
with x the latent process that drives the CU trajectory, which in turn provides the link with the preva- 
lence observations through the model in Section |2~T1 



2.2.1 Brownian motion (BM) 

The first formulation assigns a Brownian motion to a transformed version of the CU trajectory. As the 
latter has to be constrained in the [0,1] region, we work with the logit transformation of CU t , denoted 
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by: 



CU, 
dx, 



exp(x f ) 
1 +exp(x f ) 
odB, 



(3) 



The use of diffusion processes to describe time varying quantities in contexts associated with uncer - 
taint y has been used in epidemic models; see for example ICazelles and Chad (119970 . ICori et al.l (120090 
and Dureau et al. 1 d2012h . It can also be seen as a prior according to which x, is a random walk with 
continuous, yet non-differentiable trajectories. It is used here in an attempt to incorporate a limited 
amount of prior information on the shape of the trajectory. It can also be used as an exploration tool for 
potential modelling-remodelling steps towards more informative formulations. Variations of this formu- 
lation may include smoother diffusion models, by taking integrals of the Brownian motion, or alternative 
transformations such as the probit link. We note at this point that very little information is available on 
the volatility in © which is determined mostly by its prior. More details are provided in Sections 12.31 
and [3] 



2.2.2 Deterministic Bertallanfy-Richards function (dBR ) 

Quali tatively, CU trends reconstructions by alternative methods dLowndes et al. , 20 id : Bradley et al. , 
I2010T) suggest that CU was quite low in 1985, and has grown over the recent year. The above motivated 
the use of a growth curve parametric model instead of the Brownian motion diffusion. This is in line 
with various appro aches in modell i ng qu antities that a re smoothly growing in time i n different contexts 
such as biolog y (IZwietering et all 1 19900 . marketing (ILessne and Hanumaral Il988) and epidemiology 
( Ornran . 1971 ). We use the generalised Bertalanffy-Richards (BR) family ( Richards . 19591 : Garciai 
1983h that can be written as: 



CU t = 7](\-Be- kt ) 



or else, in differential equation framework: 

CU, = 
dxt = 



[{\-m)x t + r] l - m }T^ 
—kx t dt 



(4) 
(5) 



This family contains various growth curves, including the logistic (ra = 2) and Gompertz (m — > °°) 
functions. The growth curve can be parameterised by four quantities: the initial value of CUq, the time 
of inflection the value of CU after an infinite time (77, also termed as the asymptote), and the 'shape' 
or 'allometric' parameter ra. Note that the time of inflection can be related to the parameter k by the 
following equation: 



Jc X tjj 



log( 



B 



:) 



ra 



Furthermore, this definition implies that the initial value CUq is lower than ra i- ^rj. In order to focu s 



on sigmoid-shaped growth curves, we restrict our attention to cases where m > 1 (lYuancai et all 1 19970 . 
For illustration, the slope of the curve at its inflection point is a non-monotonous function of m: 



CU 
~dt 



{t in ) = nkm' 1 



'7 



log( 



l_(^L)(l_ m ) 



Ira 



in 



2.2.3 Deterministic empirical sigmoid curve (dSigm) 

An empirical sigmoid model is also considered to address the potential difficulties that can arise with 
the parameterisation of the dBR. Since growth models are used to study intrinsically growing objects, 
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trajectories that are inexplicably stable for a long period of time and that eventually start picking at a 
rapid pace are not typical unde r the BR formulation s. Moreover, inference on the allometric parameter 
m in dBR can be problematic (ILei and Zhangl. 120041) . These may lead to underestimating the amplitude 
of a shift in CU under the potential extrinsic influence of the Avahan intervention. For this reason, we 
also consider an alternative sigmoid curve, defined in the following way: 



CU, 
dx t 



a + 



c{l+x t ) 
kx t dt 



(6) 



Here the model is parameterised by its baseline (CUq ), its asymptote (77), its time of inflection f/,„), and 
the increase rate (r), from which a, b and c can be computed: 

a = CUq — b 

b = {r)-CU )c 
1 



1 



(7) 



The slope of the curve at its inflection point is now a simpler function of the model parameters: 

dCU tj-CU 

( tin ) - 



dt 



4r 



2.2.4 Stochastic growth curves 

It is also possible to combine the Brownian motion and the growth curve approaches using diffusions. 
Stochastic extensions of the dBR and dSigm model can be considered, in which the mean behaviour re- 
mains intact while some random perturbations are introduced through a stochastic differential equation. 
In order to ensure positivity, restrict CU t below one and retain the link with deterministic dBR curve, a 
geometric Brownian motion can be used to replace equations (f5]) and © 



dx t = —kx t dt + ox,dB t 



(8) 



The stochastic growth curve defined by (0]) and (HI) was also mentioned in lGarcial (Il983h . A convenient 
feature for both stochastic extension of dBR and dSigm is the fact that since 



1 



1 



m 



-{cu t 



1 — m 



y- m ), 



and x, is strictly negative, the resulting CU trajectory is maintained sttictly below v\ . Given the limited 
data at our disposal, these models can hardly be fitted in the context of this paper. Nevertheless, they 
may be helpful in cases where more observations are available. 



2.3 Priors 

The parameters contained in 0, f and 6 tr cannot be identified from the prevalence observations only, 
so we assign i nformative pr i ors on them. These are summarised in Table Q] and are similar with the 
priors used in iPickles et all (120 10l) . They were either based on previous literature regarding general 
quantities as transmission probability for unprotected acts, or life expectancy with HIV, whereas the 
ones concerning quantities that are more sociologically and geographically specific were estimated from 
cross-sectional individual-based surveys (IBBAs) in Mysore. 
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The parameter vector 9 includes an additional component, dcu, that contains the parameters for 
different models describing the CU trajectories, 6 = {di. c . , 6 tK , dcu }■ Although there is some information 
in the data for dcu, the posterior will depend on the prior to a large extent. As mentioned earlier, there is 
very little information on the volatility parameter of the BM formulation. Throughout this paper we used 
a Uniform prior between and 0.5. As explained in more detail in Section [3TT1 the parameter of main 
interest in this study is the quantity ACU = CU2009 — C^2003- Simulations suggest that, if we combine 
the BM approach with a Unif(0, 1) prior for CUq (CU in 1985), this results in a symmetric prior on 
ACU that is centered around with 2.5% and 97.5% points at ±0.6 respectively. We considered it as a 
reasonably vague prior for ACU and evaluated the performance of the resulting model via the simulation 
experiments of Section [3] More diffuse priors can also be used by setting a larger value for the upper 
limit of the Uniform prior for a. Regarding the parameters of the sigmoid curves, we used vague priors 
that are also shown in Table [J 



2.4 Computational schemes for implementation 



The joint posterior distribution can be obtained up to proportionality by the HIV transmission model of 
Section 12- 1 1 which links the prevalence observations with the CU trajectories, the trajectory priors of 
Section [2721 and the remaining priors of Section [231 For the dBR and dSigm trajectory priors, it can be 
put in a non-linear regression framework, with the non-linear function being the solution of the ODE , 



and c an therefore be implemented with standard software such as WinBUGS through WBDiff (ILunn 



2004). However this is not possible for the BM case where more involved techniques are required. Since 
the posterior probability density function is intractable, a data augmentation scheme can be utilised. This 
inference problem poses some challenges due to the high dimension of the discretised representation of 
CU\- n and its strong correlation with the vector of constant parameters, 6. This correlation imposes 
problems to Gibbs schemes on 6 and CU\- n , leadin g to extremely poor m ixing and convergence proper- 
ties. The Particle MCMC algorithm (PMCMC, see lAndrieu et all Jioioh ') algorithm offers a solution by 
updating the two components jointly, thus reducing the problem to a small-dimensional MCMC on 6. 
Implementation is based on the estimates of the likelihood p(prev b s \Q) that are provided by a particle 
filter. 

The particle filter and the PMCMC are described in Algorithms [T] and in terms of the quantities 
introduced in the previous sec ti ons. More details about this algorithm and its practical i mplementation 



can be found in iDureau et all (120121) (through an application in a similar context) and lAndrieu et al. 

teoioh . 



Algorithm 1 Particle Filter algorithm 

With N being the number of particles and n the number of observations. 

Initialise L°(0) = 1, W ; ' = jj, sample (Ct/^i,...^ from p(CU to \6) 
for i = to n — 1 do 
for j = 1 to N do 

Sample (Ct/-. i+1 ) from p(CU i>i+1 \e,CU/) 

Calculate the resulting prevalence prev\+° del by solving the ODE (for example with the Euler 
step) 

Set a 7 = p(prev°^\ \prevj^\ odel ) 
end for 

Set W/ +1 = ^- k , and L i+l (d) = V{Q) x^' 

Resample {CU\. i+l ,pl : ev\:'^ ,dd )j = \,,,,^ according to {w/ +l ), 
end for 
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Table 1: Table of priors for the different components of {9i, c ,9 tr ,9cu} 



HIV transmission model 
parameters definition 


Notation 


Range of uniform priors for the 
district of Mysore (Pickles et al. 2010) 


Probability of transmission 
from M. to F. per act 


Pm->f 


0.0006-0.0055 


Probability of transmission 
from M. to F. per act 


Pf^m 


0.0001-0.007 


Condom efficacy per act 


Cond e ff 


80%-95% 


Mean number of acts per clients 


NbActs 


1-2 


Mean number of clients per high-risk FSW 


NbClients HR 


46.6-54.0 clients/month 


Mean number of clients per low-risk FSW 


NbClients LR 


20-23.7 clients/month 


Toral number of FSWs 


Totp x + Totp 2 


2144 


Cliens/FSW population ratio 


Tot M 
Tot F{ +Totf 2 


7-19 


Mean length of sexual activity as FSW 




45-54 months 


Mean length of sexual activity as client 


Mm 1 


154-191 months 


Mean life expectancy after infection with HIV 


a- 1 


87-138.5 months 


Initial proportion of infected FSWsin 1985 


mv F[ +mv Fl 

Totfj +Totf 2 


0%-5% 


Initial proportion of infected clients in 1985 


HIV M /Tot M 


0%-5% 


Condom trajectory priors parameters definition 


Notation 


Prior 


Ailometric parameters (dBR) 


m 


^(l,10 6 )xl ]1)+oo[ 


Growth rate (dSigm) 


r 


^r(o,io 6 )x i ]1j+oo[ 


Asymptote (dBR, dSigm) 


n 


Unif -(0,1) 


Initial Value (all trajectory priors) 


cu t0 


Unif {0,1) 


Time of inflection (dBR, dSigm) 


Un 


Unif( 1985, 2009) 


Ailometric parameters, initial 
conditions and asymptote (dBR) 


(CU t0 ,ri,m) 


if CU t0 > mii?) 


Volatility (BM) 


a 


Unif "(0,0.5) 
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Algorithm 2 Particle MCMC algorithm (particle Marginal Metropolis Hastings version) 
With M being the number of iterations 
Set current 6 value, 6, to an initial value 

Use Particle Smoother (PS) according to Algorithm Q] to compute p(prev° h *\6) = L(6) and sample 

CU l:n from p(CU 1:n \y hn ,0) 
for It = 1 to M do 

Sample 6* from Q(6,.) 

Use Particle Filter to compute L(6*) and sample CU Vn from p{CU\ :n \prev^, 6*) 

Set 6 = 6* (and CJj\. n = CU°J with probability 1 A ^S§^S 
— e 

Record 6 and CU Un 
end for 



Regarding the choice of £?(.) in Algorithm |2j we use a random walk Metropolis-Hastings algorithm 
in a transformed parameter space (log or logit) to ensure positivity. Each iteration of the MCMC al- 
gorithm requires an execution of the particle filter, which induces substantial c omputational cost if the 
impo rtance sampling covariance matrix £ is ill-adapted. Adaptive approaches (IRoberts and Rosenthal , 



2009) can be used to tune £ but they require lengthy explorations of the target space. We propose to 
speed up this process by pre-exploration of a proxy posterior density p EKF (6\prev b s ) rely ing on a Gaus 



sian a pproximation of the dynamic system and the Extended Kalman filter methodology (IDureau et al. 



20121) . A simple bootstrap version of the particle filter is used as it is not straightforward to consider 



data-driven transition proposals given complex observation regime of our model. Note that given the 
short length of the observed time series, simpler alternatives to PMCMC may perform reasonably well. 
For example, the resampling step can be omitted and the particle filter output can be used to approximate 
p(CUi :n \prev°.*, 6). In our application however, it turns out that additional particles are needed for this 
approach, thus not offering a great reduction to the computational cost when compared to PMCMC. We 
therefore suggest the use of PMCMC as a robust computational tool that still does not requires a large 
amount of time in applications of this type. 



3 Evaluation methodology based on ensemble simulations 

Given the limited amount of information in available data (four or five prevalence observations, including 
initial conditions), it is very likely that the posterior output will be influenced substantially by the choice 
of CU priors and their parameters. In this section we explore the performance of the proposed inferential 
mechanism via simulation-based experiments designed to mimic the behaviour of datasets typically 
encountered in the context of application studied. Clearly, the approach of this paper heavily relies on 
the HIV infection model and the results will be quite sensitive to its specification. We therefore set up the 
simulation experiments under the assumption that the model of Section I2TT1 parameterised according to 
the priors of Section 12.31 is correct. Focus is given on quantities related with the CU trajectories, under 
the different choices of Section l2T2l that can be estimated from the samples of the posterior distribution 
provided by the MCMC algorithms of Section l2l4l We also provide some discussion regarding the static 
parameters appearing in the CU trajectory priors. 

3.1 Parameter of interest 

By fitting each of the previously introduced models we obtain samples from the marginal posterior 
density p meth {CU t \y) (meth € {dBR,dSigm,BM}). However, our interest mainly lies in the amplitude 
of the shift in CU between 2003 and April 2009 measuring the estimated increase in CU during the 
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intervention, henceforth denoted by ACU . The posterior draws of CU trajectories can be transformed to 
provide samples from the posterior of this parameter of interest. The samples can then be used to form 
an estimator ACU meth of ACU such as the posterior median of p meth (CU t \y). In what follows we explore 
the frequentist properties of this estimator derived from each of the trajectory priors. 

It may also be of interest to assess the estimating capabilities, given the limited amount of data, for 
the hyperparameters of the various CU priors (CUq, f],r,m, and a). As it turns out there is information 
for some of them {CUq, tj, ?,„), whereas some others are hard to estimate and are determined mostly by 
their prior (r, m and a). Nevertheless, from a subject matter point of view, interest lies mainly on 
ACU , whereas the remaining quantities (in CU priors) can be regarded as nuisance parameters. Another 
appealing feature of ACU is that it appears in all models and therefore provides an omnibus quantity for 
comparison. Hence, inference properties of these parameters {CUq, r\,r,m, t m and a) are only studied 
indirectly through inference properties of ACU . 



3.2 Measures of performance 

The performance of each estimator ACU meth in estimating ACU is evaluated from the following criteria 
(where L = 100, the number of simulations): 

Bias me,h = i Zi {ACU t meth ~ ACU t ) 

MSE meth = 1 £ . fi CU meth _ ACU( j 2 



Std meth = ^MSE meth - (Bias meth ) 2 

In addition to the quantities above we are also interested in assessing the discriminative ability of 
each model in detecting increases in CU. Focus is given to increases in CU that are at least as high as 
a pre-specified threshold T, which reflects the minimum practical increase. When analysing the data a 
researcher may decide that CU did increase more than T if the value of the estimator ACU meth is higher 
than a user-specified threshold t. Each decision mechanism may lead to different types of error and is 
therefore associated with a particular sensitivity and specificity. More specifically we can define the true 
and false positives in the following way 

• Sensitivity (true positives rate) for t: ^ ACU ^ j^j^fy > T - 

• Specificity (1 - false positives rate) for t: - ^ U ^cu'^t^ < T ' 

We proceed by first reporting sensitivities and specificities corresponding to the case of t = T . This 
corresponds to saying that ACU is higher than T if its estimator is higher than T. We then use a range of 
different fs and obtain the sensitivity-specificity pair that corresponds to each of them. A lower detection 
threshold t will increase the sensitivity of the method, but it also increases the risk for false positives, 
and vice versa. These pairs are combined to form the Receiver Operating Characteristics ROC curve 
by plotting sensitivity versus 1 -specificity. The area under the ROC curve (AUC) provides an overall 
measure of discriminatory power as it reflects the probability of correctly classifying a randomly chosen 
positive instance as higher than a randomly chosen negative one (Fawcett, 2006). For example, an AUC 
value of 50% indicates no power (i.e random choice) This detailed procedure is repeated to assess the 
ability to detect two different levels of increase in CU, with T set to 20% and 40% respectively. 



3.3 Simulation procedure 

The performance of the estimators derived from the different models is measured using a set of simu- 
lated experiments where CU trajectories are sampled from a given growth curve model, and parameters 
from di c and Q tr are sampled following their prior distributions. To maximise the utility of this test pro- 
cedure for future application of this methods to help evaluate Avahan in different districts (manuscript in 
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Figure 2: Simulation procedure, repeated 100 times for each trajectory prior 



preparation), only plausible and realistic CU trajectories are considered: cases with prevalence in 2010 
between 2% and 40% and with CU shifts that occurred after 1995. Furthermore, the test trajectories 
have been sampled so that ACU regularly spans the [0;0.6] interval. 

For each of these experiments, an epidemic is simulated to provide observations (yf m ) replicating 
the observation framework applied in Mysore: three prevalence estimates among female sex workers 
and one among clients, concentrated during the period of the intervention (step 2 of Figure [2]). From 
these observations, the MCMC algorithm is applied to each method meth to sample from p(CU"™ th \y s {"l) 
(step 3 of Figure[2]). Then, given the posterior CU samples the estimators ACU meth can be computed, and 
compared to their true counterparts ACU (step 4 of Figure [2]) by calculating the measures of performance 
of the previous subsection. Examples of ROC curves obtained in such manner are provided for the 
Brownian motion trajectory prior in Figure [3] 



4 Results 

4.1 Comparison of the CU trajectory models from ensemble simulations 

The results of the simulation experiments are presented in Tables [2] and [3] Table [2] focuses on the 
frequentist properties of the estimators, derived from the median of the posterior densities provided by 
each trajectory prior, and reports the bias, the standard deviation and the MSE of each estimator. Table 
[3] concentrates on the ability of the model to classify shift amplitudes of CU from 2003 to 2009 in the 
right order (AUCs), and more specifically on the risk of overstating versus understating the quantity of 
interest. In other words, we aim to address questions such as 'was the shift in CU during the intervention 
over 0.2 (0.4)?', via the corresponding sensitivity and specificity and the resulting AUC. 

The first table of this section (Table [2]) suggests that no model tends to consistently overstate ACU as 
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Figure 3: ROC curve when testing for ACU > 0.2 and ACU > 0.4, under Brownian motion trajectory 
prior. This curve was estimated from 100 simulations. Very similar shapes are obtained for the alterna- 
tive trajectory priors. 

all biases are negative. More precisely, the dBR model tends to strongly understate the shift in amplitude, 
by up to 0.23. The biases of the dSigm model is smaller (-0.17), but optimal results are obtained with 
the Brownian Motion model (-0.13). Similarly, in terms of MSE, the performance of the BM model 
is better. Figure |4] shows the bias, estimated from 100 simulations, of each model as a function of the 
true amplitude of the shift in CU. It suggests that the bias increases as a function of the size of the true 
amplitude of the shift in CU, and that the ranking of the different models is consistent across different 
configurations (from no shift in CU to moderate and high shifts in CU). If, for example, the true shift is 
50%, it is on average underestimated by 0.15 with the best method (BM) and more than 0.35 points with 
the BR method. 

Table [3] and Figure [4] suggest that all estimators based on the median of the posterior density of 
p(ACU\yi M ) have good distinguishing power: the AUC is between 0.82 and 0.91 in all cases. In line 
with the results of Table |2j the estimates provided by the BM model achieve better sensitivity (68% 
and 49%) than the other models (between 5% and 51%), and very good specificity (over 94%). The 
performance, particularly the sensitivity, decreases as the level of increase in CU that is being tested for 
increases. 

The results presented in these tables provide an informative qualitative assessment for the ability of 
the different models to capture ACU from limited prevalence data on an important and diverse set of 
likely scenarios (100 experiments). First of all MSE and AUC figures suggest that although the number 
of prevalence observations is low and some elements of the transmission process are uncertain, it is still 
possible to extract information on our time varying parameter and provide estimates of the amplitude of 
the shift in CU during the intervention. Furthermore, there seems to be a possibility to control the risk 
of overstating these quantities by analysing the outputs of the three models that offer different levels of 
compromise between sensitivity and specificity. Thus, although the procedure may fail to identify some 
shifts in CU, we have the reassurance that if it is detected it is likely to be true, which , in the context of 
interest, results in conservative estimates of intervention impact on CU trends. 

The bias in estimating ACU under each of the CU priors can be attributed to a large extent to the 
prior implied by each formulation on ACU. As mentioned in Section 1231 the BM approach results in 
a symmetric prior on ACU that is centered around with 2.5% and 97.5% points at ±0.6 respectively. 
The posterior median is therefore pulled towards resulting in conservative estimates. The amount of 
shrinkage depends on the upper limit of the Uniform prior on a. The corresponding priors under the 
dBR and dSigm formulations result it priors for ACU that put more mass around 0, although this heavily 
depends on the values of r and m that are hard to estimate. The resulting biases are therefore higher but 
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Figure 4: Bias of each model as a function of the true amplitude of the shift in condom use, estimated 
from 100 simulations. 



Table 2: Frequentist properties of the different estimators of the amplitude of the shift in condom use 
during the intervention, estimated from 100 simulations 





Deterministic 


Deterministic 


Brownian 




Bertalanffy- 


empirical 


motion 




Richards 


sigmoid 




Bias 


-0.23 


-0.17 


-0.13 


Error standard deviation 


0.16 


0.17 


0.17 


Mean Squarred Error (MSE) 


0.078 


0.0057 


0.045 



they have been obtained without placing informative priors on their hyperparameters, as was done with 
a under the BM formulation. 

The two models with the higher overall performance, BM and dSigm, are quite different in nature: 
the dSigm trajectories are smooth, whereas under the Brownian motion prior they are non-differentiable. 
Hence, the choice between the two models can also be based on prior beliefs of the researcher regarding 
the smoothness of the CU trajectories. 

4.2 Application: what can we infer on the trajectory of CU in Mysore? 

Mysore is one of the districts targeted by the Avahan intervention, and Avahan was the first HIV preven- 
tion intervention in this region. Four HIV prevalence estimates have been obtained between 2003 and 
2009, three among female sex workers, and one among clients. Results from the inference procedure 
using a Brownian motion model are shown in Figure [5] suggesting a strong impact of the intervention. 
The purpose of this paper was to assess what level of increase of CU between 2003 and 2009 can be 
inferred while controlling the risk of overstating it. As it was shown in section |4~T1 dSigm models could 
provide a good alternative to the BM formulation. Hence, we also present here results obtained with this 
model for the Mysore dataset (see figure [5]>. Table [4] shows the estimates of ACU for each of the three 
presented models. The results indicate a positive increase in all cases. In particular, for the BM and 
dSigm models the corresponding posterior means are 0.54 and 0.55 while the 95% credible intervals are 
[0.04;0.99] and [0.14;0.99] respectively. 

A stronger conclusion regarding a lower bound for the CU shift between 2003 and 2009 can be made 
by comparing the posteriors medians to the results of Table [3] If the underlying set of simulations is to 
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Table 3: General distinctive power (AUC) of the median estimator of the shift, and specific sensitivity 
and specificity when answering: is the shift in CU during the intervention stronger than 0.2? than 0.4? 
These quantities were estimated over 100 simulations. 
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Brownian 
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Empirical 


motion 
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AUC 


0.91 


0.9 


0.9 


ACU > 0.2? 


Sensitivity 


46% 


51% 


68% 




Specificity 


100% 


100% 


96% 




AUC 


0.85 


0.83 


0.82 


ACU > 0.4? 


Sensitivity 


5% 


38% 


49% 




Specificity 


100% 


95% 


94% 



Table 4: Estimates of the change in CU in Mysore between 2003 and 2009. 





Posterior 


Posterior 


95% credible 




mean 


median 


interval 


Deterministic Bertalanffy-Richards 


0.30 


0.28 


[0.11;0.73] 


ACU Deterministic Sigmoid 


0.53 


0.54 


[0.14; 0.99] 


Brownian motion 


0.52 


0.55 


[0.04; 0.99] 



be considered realistic, an argument in favour of a CU increase being at least 0.4 can be made. Since 
the posterior medians are more than 0.4 under both BM and dSigm models (.54 and .55 respectively), 
Table [3] suggests that a statement for ACU > 0.4 will be correct with probability given by the specificity 
of each model (94% BM and 95%-dSigm). While being more informative than the credible intervals 
obtained directly from the posterior densities (over 0.04 and 0.14 respectively with BM and dSigm), 
these numbers are heavily dependent on the assumption that the simulations of Section [3] provided an 
adequate approximation of the reality. 

Finally, Figure [5] and Table [4] show that the results obtained from the deterministic Sigmoid and 
Brownian motion models strongly coincide: they suggest that CU was stable over the 1985-2003 period, 
remaining below 0.5, sharply increased between 2003 and 2007, and stabilised between 0.8 and 0.9. 



5 Discussion 



In this article, we presented a Bayesian approach to draw conclusions regarding the evolution of time- 
varying behavioural parameters in the context of HIV such as CU among FSWs. Inference can be 
based on prevalence estimates while a substantial amount of information from additional sources can 
be incorporated via prior distributions. In order to describe the behaviour of CU trajectories we intro- 
duced three different formulations based on Brownian motion and growth curves such as the generalised 
Bertalanffy Richards and empirical sigmoid models. To our knowledge, these formulations are new 
in this context. The presented computational framework allows estimation of C U trajectories as we ll 
as functionals ther e of, us i ng advanced MCMC m ethods and following ideas of bureau et all (120120 : 
Cazelles and Chad (119970 : iRasmussen et al.1 (1201 lh . Nevertheless, in comparison to these approaches, 
the problem of evaluating the Avahan intervention by estimating its impact on CU from prevalence es- 
timates is of additional difficulty due to the limited amount of information; the application to Mysore 
district was based on three observations of prevalence among FSWs and one among clients, plus hypoth- 
esis on the initial value of prevalence in 1985. Various simulation experiments were conducted in order 
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Figure 5: Estimates obtained for Mysore district. 

a) reconstructed prevalence trajectory among female sex workers when condom use modelled with 
Brownian motion 

b) reconstructed prevalence trajectory among clients when condom use modelled with Brownian motion 

c) reconstructed condom use trajectory when modelled with Brownian motion 

d) reconstructed condom use trajectory when modelled with deterministic Sigmoid 
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to assess the validity of the procedure, examining the frequentist properties of the underlying estimators 
and the ability of the model to avoid overestimation via conducting ROC analysis. The evidence from 
the simulation experiments is encouraging, suggesting that the approach can be used in this context for 
making conservative estimates of changes in CU both with the Brownian motion and the deterministic 
sigmoid trajectory priors. However, the overall performance is bound to depend on the deterministic 
H IV infection model w hich was parameterised based on a substantial amount of prior information, as 
Pickles et al. d2010h . as well as on assumptions such as the very low HIV prevalence in 1985. Most 



in 



of the prior information utilised in this study was obtained from additional data sources (IBBAs). In 
the presence of all these data sources, it would be interesting to co nsider and con t rast a j oint inferential 
schem e through an evidence synthesis framework in the spirit of IGoubar et al. (l2008h : Presanis et al. 
d201lh . 

While the representation of HIV transmissions in this paper is simpler in behavioural terms in com- 
parison with the model presented in IPickles et all (l2010h . the model is enriched as it explores the CU 
trajectories space rather than working with three pre-determined scenarios. Nevertheless, there are rea- 
sons for a potential overestimation of the shift amplitude in this simpler model as coinfection with other 
sexually transmitted diseases were ignored (although higher transmission probability per unprotected 
act were allowed to compensate for the latter), and no acute phase was considered. However, diffusion 
driven models aim at capturing and compensating for structural mis-specifications while capturing the 
main dynamics of the system and have been shown here to provide conservative estimates. Overall it 
may be viewed as a different an d complementary c hoice in the trade-off between richness and tractabil- 
ity of the model compared to IPickles et all (120101) . Lastly, this approach relies on the hypothesis that 
changes in transmission probabilities are solely related to changes in CU, ignoring for example potential 
changes in the frequency of commercial sex partnerships. This choice can be motivated by the strong 
focus of the Avahan intervention on prevention measures and the relative stability in the frequency of 
commercial sex exhibited by the series of cross-sectional bio-behavioural surveys that were conducted 
during the period of the intervention. 
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